Overview

The purpose of this project is to investigate the provision of timely and effective emergency department care in Pennsylvania. The aims of the project are:
+ To compare emergency department access and ambulatory care metrics in Pennsylvania to that of other states and to national averages.
+ To compare emergency department access and ambulatory care metrics within Pennsylvania at the county level.
+ To investigate if there are any relationships between socioeconomic and demographic variables and access and care metrics using multivariate regressions.

Introduction

Access to health care means having “the timely use of personal health services to achieve the best health outcomes” (IOM, 1993). Timely and effective ambulatory care is an important cornerstone of healthcare access and is essential for good patient outcomes and satisfaction. This is especially true in the US, where the cost of healthcare is high and ever increasing, but healthcare outcomes remain poor, especially for a developed nation. It is important that care not only be evidence-based and clinically indicated, but also timely and prompt, with delivery of the “right care at the right time.”

Ambulatory care includes primary care, outpatient care and emergency room care. Ideally, patients can use primary care and outpatient for usual care, and only visit the ED for emergent conditions. Unfortunately, given the cost of healthcare and differential access to quality insurance, many patients use the ED as a ready alternative to their usual source of medical care. As such, many US adults instead visit hospital EDs with problems that could have been managed appropriately in general primary care practice.

This added strain on emergency departments across the nation has led to issues of overcrowding, long wait times and significant strain on limited resources. This is not without its repercussions, as this thereby limits the quality and timeliness of care through the ED. The frequency of ED visits is thought particularly high among people who do not have a regular source for health care or who are uninsured.

Because people are substituting the ER for regular inpatient and outpatient treatments, we need to look specifically at the ER to get a better idea of what’s happening to these groups and if ED care is meeting these needs. Thus, in this interdisciplinary project, I attempt to examine metrics of timely and effective emergency department care and how they vary in Pennsylvania.

With respect to population health, Pennsylvania falls below national averages, ranking 28th among the 50 states in 2019 according to United Health Foundation’s America’s Health Rankings. PA ranked #28 for preventable hospitalizations. As in other states across the country, measures of health status in Pennsylvania vary by geographic and socioeconomic factors which, according to the AARP Pennsylvania in 2019, are combining to restrict access to health care services in Pennsylvania. According to AARP, these structural health disparities are particularly noted among those living in rural and low-resourced areas of the state and among URM communities (Black/African American and Latino), all of whom lack access to health care and culturally appropriate, evidence-based health promoting and chronic disease management practices.

In the EM landscape, there are numerous proxies for Timely and Effective Care (TEC).
+ The goal for patients with stroke should be a door-to-head CT time within 20 minutes in ≥50% of stroke patients who may be candidates for IV tPA or mechanical thrombectomy.
+ The goal for patients with ST-elevation myocardial infarction should be a door-to-needle time within 30 minutes (for thrombolysis) and a door-to-balloon time within 90 minutes (for PCI).
+ The goal for patients with ST-elevation myocardial infarction at a non-equipped hospital to be transferred to to a specialty hospital should be a door-in-door-out time within 30 minutes.
Additionally, given the wide publicity that long wait times has attracted it has attracted (on average, >2 hours across the US), ED waiting time is also an appropriate proxy.

This is an interdisciplinary issue, as it has implications for physicians, public health advocates, epidemiologists, economists, public service advocates, etc. As such, addressing the problem of timely access to care requires a multidisciplinary approach, as I have undertaken here.This project was supported by Dr. Gary Weissman, MD, MSHP (Assistant Professor of Medicine and Senior Fellow Leonard Davies Institute) and Dr. John Holmes PhD, FACE, FACMI (Professor of Medical Informatics in Epidemiology, Associate Director of the Institute for Biomedical Informatics), who both provided invaluable advice and guidance for mapping the data and assumptions for service areas.

Data Sources

Major data sources used in this project are listed below. These sources are all publicly available.
+ “TEC data”: Timely and Effective Care (TEC) data from CMS through 2020. Contains national, county-level and hospital-level data.
+ “Census data”: US Census Bureau data available in the tidycensus R package. Using acs5 survey 5-year data ending 2019.

Methods & Results

First I investigated the TEC data: I visualized how TEC metrics vary in Pennsylvania as compared to other states regionally and nationally. Then, I visualized how TEC metrics. Then, I visualized variations of TEC metrics in Pennsylvania at the county-level using maps, highlighting the two most populous counties of Philadelphia County (with capital city of Philadelphia) and Alleghany County (with second major city of Pittsburgh). Next, I investigated the socioeconomic and demographic data, visualizing variations by county in PA as they vary across PA by county. Finally, I investigated the relationships between the TEC data and socioeconomic data using multivariate regression. In order to do this, I needed to overcome the problem that the TEC data is at the hospital level and the socioeconomic/demographic data is at the county level by converting both to zip code level data. For the TEC data, I used the hospital addresses calculate catchment/service areas around each hospital ED, assuming a 5-mile radius around hospitals based on zip code.

Preparing packages and data

All necessary packages are installed and loaded, and necessary keys are registered.

## Loading required package: knitr

Functions necessary for later processing are created.

# Theme for maps
my_theme <- function() {
  theme_minimal() +                                  
  theme(
        axis.line = element_blank(),                 
        axis.text = element_blank(),                 
        axis.title = element_blank(),
        panel.grid = element_line(color = "white"),  
        legend.key.size = unit(0.5, "cm"),          
        legend.text = element_text(size = 8),       
        legend.title = element_text(size = 10),
        plot.title = element_text(size = 14),
        panel.border = element_blank())
}

# Min for map scales
prev_min <- function(x) {
  min(c(x),na.rm=TRUE)
}

# Max for map scales
prev_max <- function(x) {
  max(c(x), na.rm=TRUE)
}

# Theme for bar graphs
my_theme_bar <- function() {
  theme_minimal() +                                  
  theme(
        axis.line = element_blank(),                 
        panel.grid = element_line(color = "white"),  
        legend.key.size = unit(0.5, "cm"),          
        legend.text = element_text(size = 8),       
        legend.title = element_text(size = 10),
        plot.title = element_text(size = 14),
       panel.border = element_blank())
}

# Percent function
percent <- function(x, format = "f"){ 
  paste0(formatC(x, format = format, digits = 2), "%")
}

# Function to extract legend for shared graphs
g_legend<-function(a.gplot){
  tmp <- ggplot_gtable(ggplot_build(a.gplot))
  leg <- which(sapply(tmp$grobs, function(x) x$name) == "guide-box")
  legend <- tmp$grobs[[leg]]
  return(legend)}

Necessary inputs are downloaded and/or read.
+ TEC data is downloaded directly from the CMS website.
+ Counties map data from the tigris package.
+ 2019 County population data downloaded from US Census Bureau County Population Totals 2010-2019.

# TEC data
tec_hosp <- read.csv(file = "https://data.cms.gov/provider-data/sites/default/files/resources/350f34f9ef3d484925d49dfcce7a0f54_1632355520/Timely_and_Effective_Care-Hospital.csv")
tec_state <- read.csv(file = "https://data.cms.gov/provider-data/sites/default/files/resources/0b2db52e1ce72942f369cc00c00649f8_1632355520/Timely_and_Effective_Care-State.csv")
tec_natl <- read.csv(file = "https://data.cms.gov/provider-data/sites/default/files/resources/4766dbf7fd2de19618a27f5546954463_1632355520/Timely_and_Effective_Care-National.csv")

# PA counties data
pa_counties<-tigris::counties(state = "42", cb = TRUE, progress_bar = FALSE) #FIPS 42 is Pennsylvania

# PA counties population data
pa_pop<- readxl::read_xlsx("/Users/moniquearnold/Documents/PSOM/BMIN 503/Final Project/BMIN503_Final_Project/Data/co-est2019-annres-42.xlsx",range="A6:M72",
   col_names = c("County","Census","Estimate Base",
                 "pop_2010",
                 "pop_2011",
                 "pop_2012",
                 "pop_2013",
                 "pop_2014",
                 "pop_2015",
                 "pop_2016",
                 "pop_2017",
                 "pop_2018",
                 "pop_2019"))


# Cleaning data
tec_state$NAME <- state.name[match(tec_state$State,state.abb)]
tec_state$Score<- as.numeric(tec_state$Score)

pa_pop$County<- gsub('^\\.|\\.$', '', pa_pop$County) 
pa_pop$NAME<- gsub(" County[,%] Pennsylvania$","", pa_pop$County)

Visualizing national data

First, national data is visualized using us_states dataset provided by the spData package. Note: ‘total_pop_15’ is the numerical vector of total population in 2015. For simplicity only the contiguous United States is considered.

Metrics investigated are:
+ OP_18b: Average (median) time patients spent in the ED
+ OP_18c: Average (median) time patients spent in the ED (Psychiatric/Mental Health Patients)
+ OP_2: Percentage of outpatients with chest pain or possible heart attack who got drugs to break up blood clots within 30 minutes of arrival
+ OP_3b: Percentage of patients who came to the ED with stroke symptoms who received brain scan results within 45 minutes of arrival
+ OP_23: Average (median) number of minutes before outpatients with chest pain or possible heart attack who needed specialized care were transferred to another hospital

# Select variables to investigate
varlist <- c("OP_18b","OP_18c","OP_2","OP_23","OP_3b")

state_vars <- tec_state %>% 
   dplyr::select(NAME,State,Measure.ID,Score) %>%
   filter(Measure.ID %in% varlist) %>% 
   drop_na() %>%
   reshape(idvar = c("State","NAME"), timevar = "Measure.ID", direction = "wide", varying=varlist)

# Merge to usa map data
usa <- us_states
to_map_natl<- merge(usa,state_vars, stringsAsFactors=FALSE)
str(to_map_natl) #check
## Classes 'sf' and 'data.frame':   48 obs. of  13 variables:
##  $ NAME        : chr  "Alabama" "Arizona" "Arkansas" "California" ...
##  $ GEOID       : chr  "01" "04" "05" "06" ...
##  $ REGION      : Factor w/ 4 levels "Norteast","Midwest",..: 3 4 3 4 4 1 3 3 3 4 ...
##  $ AREA        : Units: [km^2] num  133709 295281 137690 409747 269573 ...
##  $ total_pop_10: num  4712651 6246816 2872684 36637290 4887061 ...
##  $ total_pop_15: num  4830620 6641928 2958208 38421464 5278906 ...
##  $ State       : chr  "AL" "AZ" "AR" "CA" ...
##  $ OP_18b      : num  139 174 126 163 140 163 194 152 145 129 ...
##  $ OP_18c      : num  213 291 201 269 234 278 300 225 289 181 ...
##  $ OP_2        : num  51 69 38 48 60 33 NA 54 39 33 ...
##  $ OP_23       : num  62 72 73 73 73 75 68 75 70 62 ...
##  $ OP_3b       : num  74 71 62 78 56 64 NA 56 72 62 ...
##  $ geometry    :sfc_MULTIPOLYGON of length 48; first list element: List of 1
##   ..$ :List of 1
##   .. ..$ : num [1:51, 1:2] -88.2 -88.2 -87.4 -86.9 -85.6 ...
##   ..- attr(*, "class")= chr [1:3] "XY" "MULTIPOLYGON" "sfg"
##  - attr(*, "sf_column")= chr "geometry"
##  - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA NA NA NA NA NA ...
##   ..- attr(*, "names")= chr [1:12] "NAME" "GEOID" "REGION" "AREA" ...
to_map_northeast<- to_map_natl %>% filter(REGION=="Norteast")
to_map_pa <- to_map_natl %>% filter(NAME=="Pennsylvania")

# Plot maps
#OP_18b
map_natl_OP_18b<-ggplot() + 
   geom_sf(data = to_map_natl, aes(fill = OP_18b), lwd=0, colour="white") + 
   geom_sf(data = to_map_pa, fill = NA, color = "red", size = 0.8) +
   coord_sf() + 
   my_theme() + 
   scale_fill_viridis(discrete=FALSE, name = "Time (mins)", option = "mako", direction = -1,
                         limit = range(prev_min(to_map_natl$OP_18b), prev_max(to_map_natl$OP_18b))) +
   labs(title= "Fig. 1: Average (median) time patients spent in the ED",
        subtitle= "July - December 2020",
        caption = "Source: Centers for Disease Control. Pennsylvania state is highlighted.") +
   theme(legend.justification = c(0, 1),legend.position = c(0, 0.35)) +
   geom_text(aes(x = -85, y = 35, hjust="left", 
                 label = "Pennslvania median = 156 mins \nUS national median = 148 mins"), stat = "unique")


map_natl_OP_18c<-ggplot() + 
   geom_sf(data = to_map_natl, aes(fill = OP_18c), lwd=0) + 
   geom_sf(data = to_map_pa, fill = NA, color = "red", size = 0.8) +
   coord_sf() + 
   my_theme() + 
   scale_fill_viridis(discrete=FALSE, name = "Time (mins)", option = "mako", direction = -1,
                         limit = range(prev_min(to_map_natl$OP_18c), prev_max(to_map_natl$OP_18c))) +
   labs(title= "Fig. 2: Average (median) time patients spent in the ED \n(Psychiatric/Mental Health Patients)",
        subtitle= "July - December 2020",
        caption = "Source: Centers for Disease Control. Pennsylvania state is highlighted.") +
   theme(legend.justification = c(0, 1),legend.position = c(0, 0.35)) +
   geom_text(aes(x = -85, y = 35, hjust="left", 
                 label = "Pennslyvania median = 263 mins \nUS national median = 248 mins"), stat = "unique")

map_natl_OP_2<-ggplot() + 
   geom_sf(data = to_map_natl, aes(fill = OP_2), lwd=0) + 
   geom_sf(data = to_map_pa, fill = NA, color = "red", size = 0.8) +   
   coord_sf() + 
   my_theme() + 
   scale_fill_viridis(discrete=FALSE, name = "% of patients", option = "magma", direction = -1,
                         limit = range(prev_min(to_map_natl$OP_2), prev_max(to_map_natl$OP_2))) +
   labs(title= "Fig. 3: Percentage of outpatients with chest pain or possible heart \nattack who got drugs to break up blood clots within 30 minutes of arrival",
        subtitle= "July - December 2020",
        caption = "Source: Centers for Disease Control. Pennsylvania state is highlighted.") +
   theme(legend.justification = c(0, 1),legend.position = c(0, 0.35)) +
   geom_text(aes(x = -85, y = 35, hjust="left", 
                 label = "Pennslyvania median = 32% \nUS national median = 52%"), stat = "unique")   

map_natl_OP_23<-ggplot() + 
   geom_sf(data = to_map_natl, aes(fill = OP_23), lwd=0) +
   geom_sf(data = to_map_pa, fill = NA, color = "red", size = 0.8) +
   coord_sf() + 
   my_theme() + 
   scale_fill_viridis(discrete=FALSE, name = "% of patients", option = "cividis", direction = 1,
                         limit = range(prev_min(to_map_natl$OP_23), prev_max(to_map_natl$OP_23))) + 
   labs(title= "Fig. 4: Percentage of patients who came to the ED with stroke symptoms \nwho received brain scan results within 45 minutes of arrival",
        subtitle= "July - December 2020",
        caption = "Source: Centers for Disease Control. Pennsylvania state is highlighted.") +
   theme(legend.justification = c(0, 1),legend.position = c(0, 0.35)) +
   geom_text(aes(x = -85, y = 35, hjust="left", 
                 label = "Pennslyvania median = 65% \nUS national median = 72%"), stat = "unique")  

map_natl_OP_3b<-ggplot() + 
   geom_sf(data = to_map_natl, aes(fill = OP_3b), lwd=0) +
   geom_sf(data = to_map_pa, fill = NA, color = "red", size = 0.8) +
   coord_sf() + 
   my_theme() + 
   scale_fill_viridis(discrete=FALSE, name = "Time (mins)", option = "magma", direction = -1,
                         limit = range(prev_min(to_map_natl$OP_3b), prev_max(to_map_natl$OP_3b))) +
   labs(title= "Fig. 5: Average (median) number of minutes before outpatients with chest pain \nor possible heart attack who needed specialized care were transferred to \nanother hospital",
        subtitle= "July - December 2020",
        caption = "Source: Centers for Disease Control. Pennsylvania state is highlighted.") +
   theme(legend.justification = c(0, 1),legend.position = c(0, 0.35)) +
   geom_text(aes(x = -85, y = 35, hjust="left", 
                 label = "Pennslyvania median = 76 mins \nUS national median = 61 mins"), stat = "unique")  

map_natl_OP_18b

map_natl_OP_18c

map_natl_OP_2

map_natl_OP_23

map_natl_OP_3b

Comparing Pennsylvania regionally

Next, median metrics of Pennsylvania are compared to those of other states in the Northeast.
Note: Since only the final median values and not the underlying data, I cannot perform any statistical comparisons.

# Determine averages
fg_northeast<- as.data.frame(to_map_natl) %>% 
   filter(REGION=="Norteast") %>%
   dplyr::select(NAME,OP_18b,OP_18c,OP_2,OP_23,OP_3b)

# Reshape for barchart
forgraph_northeast <- melt(setDT(fg_northeast), id.vars = c("NAME"), variable.name = "score")

# Plot graphs
bar_OP_18b18c <- forgraph_northeast %>% 
   filter(variable=="OP_18b" | variable=="OP_18c") %>%
   ggplot(aes(x=reorder(factor(NAME),value),  y=value, fill=variable)) + 
   geom_bar(stat="identity", colour = "black", position = "stack") + 
   my_theme_bar() + 
   theme(axis.title.y=element_blank()) +
   geom_text(aes(label=value), vjust=1, color="white", size=3.5) +
   scale_y_continuous(name="Median Time (mins)") + 
   coord_flip() + 
   labs(title= "Fig. 6: Average (median) time patients spent in the ED",
        subtitle= "July - December 2020",
        caption = "Source: Centers for Disease Control. Pennsylvania state is highlighted.") +
   scale_fill_brewer(palette = "Set1",
                     breaks=c("OP_18b", "OP_18c"),
                     labels=c("All Patients", "Psychiatric/Mental Health Patients"))  + 
   theme(legend.title=element_blank())

bar_OP_2_23 <- forgraph_northeast %>% 
   filter(variable=="OP_2" | variable=="OP_23") %>%
   ggplot(aes(x=reorder(factor(NAME),value),  y=value, fill=variable)) + 
   geom_bar(stat="identity", colour = "black", position = "stack") + 
   my_theme_bar() + 
   theme(axis.title.x=element_blank()) +
   geom_text(aes(label=value), vjust=1, color="white", size=3.5) +
   scale_y_continuous(name="% of patients") + 
   labs(title="Fig. 7: Percent of ED Patients Presenting with Acute Symptoms who \nReceived Timely Care ",
        subtitle= "July - December 2020",
        caption = "Source: Centers for Disease Control. Pennsylvania state is highlighted.") +
   scale_fill_brewer(palette = "Accent",
                     breaks=c("OP_2", "OP_23"),
                     labels=c("% pts with chest pain or possible heart attack \nwho got drugs to break up blood clots within 30 minutes of arrival", "% pts with stroke symptoms who received \nbrain scan results within 45 minutes of arrival"))  + 
   theme(legend.title=element_blank(), legend.position="bottom")


bar_OP_18b18c

bar_OP_2_23

Investigating Pennsylvania hospital-specific data

A brief look at the wait times for the hospitals.

# Subset data to metrics of interest and PA
df_hosp <- tec_hosp %>% filter(Measure.ID %in% varlist & State=="PA")

## General hospitals
# Get list of general hospital names
df_hosp_fac <- df_hosp %>% 
   mutate(addr=paste0(Address,", ",City, ", ",State," ",ZIP.Code)) %>%
   distinct(Facility.Name,addr,Measure.ID,Score) %>% 
   arrange(Facility.Name) %>%
   reshape(idvar = c("Facility.Name","addr"), timevar = "Measure.ID", direction = "wide", varying=varlist)

df_hosp_fac$OP_18b <- as.numeric(df_hosp_fac$OP_18b)
df_hosp_fac$OP_18c <- as.numeric(df_hosp_fac$OP_18c)
df_hosp_fac$OP_2 <- as.numeric(df_hosp_fac$OP_2)
df_hosp_fac$OP_23 <- as.numeric(df_hosp_fac$OP_23)
df_hosp_fac$OP_3b <- as.numeric(df_hosp_fac$OP_3b)

df_hosp_fac$addr <- gsub('34TH ST &', '3400', df_hosp_fac$addr)
df_hosp_fac$addr <- gsub('34TH &', '3400', df_hosp_fac$addr)

# Format table
dt_pahosp <- df_hosp_fac %>% dplyr::select(Facility.Name,OP_18b,OP_18c) %>% filter(! is.na(OP_18b) | ! is.na(OP_18c)) 
dt_pahosp$Facility.Name<-str_to_title(dt_pahosp$Facility.Name) 
dt_pahosp$OP_18b_time <- dt_pahosp$OP_18b %>% hms::as_hms() %>% as.POSIXlt() %>% format("%M:%S")
dt_pahosp$OP_18c_time <- dt_pahosp$OP_18c %>% hms::as_hms() %>% as.POSIXlt() %>% format("%M:%S")

dt_pahosp <- dt_pahosp %>% dplyr::rename('Hospital'=Facility.Name,
                                 'Time Spent in ED - All Patients'=OP_18b_time,
                                 'Time Spent in ED - Mental Health Patients'=OP_18c_time) 

# Calculate means
dt_pahosp_means <- dt_pahosp %>% dplyr::summarize(mean_wait_all=mean(OP_18b, na.rm = TRUE),
                                                  mean_wait_bhv=mean(OP_18c, na.rm = TRUE))

dt_pahosp_means$`All Patients` <- dt_pahosp_means$mean_wait_all %>% 
   hms::as_hms() %>% as.POSIXlt() %>% format("%M:%S")
dt_pahosp_means$`Mental Health Patients` <- dt_pahosp_means$mean_wait_bhv %>% 
   hms::as_hms() %>% as.POSIXlt() %>% format("%M:%S")
dt_pahosp_means <- dt_pahosp_means %>% dplyr::select(-mean_wait_all,-mean_wait_bhv) %>% gather()


# Final tables
head(dt_pahosp[order(dt_pahosp$`Time Spent in ED - All Patients`,decreasing=TRUE),],n=10) %>%
   dplyr::select(-OP_18b,-OP_18c) %>%
   kable(caption="Table 1. Longest Patient Wait Times at PA Hospitals (Top 10)", 
        align="lcc",
        row.names = FALSE) %>%
   kable_styling(fixed_thead = T, bootstrap_options = c("hover", "condensed"), full_width = F)  %>%
   row_spec(3:5, bold = T,  background = "yellow")
Table 1. Longest Patient Wait Times at PA Hospitals (Top 10)
Hospital Time Spent in ED - All Patients Time Spent in ED - Mental Health Patients
Jefferson Health- Northeast 06:26 NA
Milton S Hershey Medical Center 04:54 08:24
Hospital Of Univ Of Pennsylvania 04:11 04:44
Chester County Hospital 04:05 10:08
Penn Presbyterian Medical Center 03:57 04:34
Robert Packer Hospital 03:44 06:07
Upmc Memorial 03:44 04:24
Penn State Health St. Joseph 03:42 04:53
Main Line Hospital Lankenau 03:38 NA
Einstein Medical Center Montgomery 03:37 NA
tail(dt_pahosp[order(dt_pahosp$`Time Spent in ED - All Patients`,decreasing=TRUE),],n=10) %>%
   dplyr::select(-OP_18b,-OP_18c) %>%
   kable(caption="Table 2. Shortest Patient Wait Times at PA Hospitals (Bottom 10)", 
        align="lcc",
        row.names = FALSE) %>%
   kable_styling(fixed_thead = T, bootstrap_options = c("hover", "condensed"), full_width = F)
Table 2. Shortest Patient Wait Times at PA Hospitals (Bottom 10)
Hospital Time Spent in ED - All Patients Time Spent in ED - Mental Health Patients
Millcreek Community Hospital 01:36 03:28
Ahn Emerus Westmoreland, Llc 01:31 NA
Penn Highlands Tyrone 01:31 NA
Highlands Hospital 01:30 04:05
Upmc Kane 01:30 03:02
Mercy Catholic Medical Center- Mercy Fitzgerald 01:28 04:44
Lecom Health Corry Memorial Hospital 01:24 NA
Grove City Medical Center 01:22 NA
Bucktail Medical Center 01:19 NA
Conemaugh Meyersdale Medical Center 01:08 NA
dt_pahosp_means %>%
  kable(caption="Table 3. Mean Patient Wait Times at PA Hospitals ", 
        align="lr",
        row.names = FALSE,
        col.names=NULL) %>%
   kable_styling(fixed_thead = T, bootstrap_options = c("hover", "condensed"), full_width = F)
Table 3. Mean Patient Wait Times at PA Hospitals
All Patients 02:40
Mental Health Patients 05:03

Investigating hospital distribution in Pennsylvania

Next, I explore the availability of hospitals across Pennsylvania. I will map hospital densities using the hospital data from TEC.I will map all hospitals, then hospitals across a few key service lines, namely: pediatric hospitals, trauma centers and stroke centers.

Using the package ggmap, I will get latitude and longitude data for each hospital address in the TEC data using the Google API. To do this, I registered for a free Google console account, generated an API key that I registered in R Studio, and enabled the following APIs on the Google Console:
+ Directions API
+ Geocoding API
+ Geolocation API
+ Places API
+ Maps Javascript AVI
+ Maps Embed API

I then converted the data frame of resultant cartesion coordinates to an sf object. I used the the projection WGS84, which is the CRS code #4326, a priori defined in the sf object. I mapped the hospitals onto a map of Pennsylvania using the counties dataset tigris library, overlayed with county population data.

## All hospitals
   # getting map coordinates (lat and long) of hospital addresses 
   for(i in 1:nrow(df_hosp_fac))
   {
       ifelse(is.na(df_hosp_fac$addr[i]), NA,      
       hosp_ggmap_pa <- geocode(df_hosp_fac$addr[i], 
                                output = "more", 
                                source = "google", 
                                messaging = FALSE))
             df_hosp_fac$Longitude[i] <- as.numeric(hosp_ggmap_pa[1])
             df_hosp_fac$Latitude[i] <- as.numeric(hosp_ggmap_pa[2])
   }
   
   df_hosp_fac_tib <- as_tibble(df_hosp_fac)
   df_hosp_fac_sf <- st_as_sf(df_hosp_fac_tib, coords = c("Longitude", "Latitude"), crs = 4326)
   mapview(df_hosp_fac_sf) #Quickly checking addresses geocoded correctly
   # merge to PA county data to PA population data
   pa_tomap<-merge(pa_counties, pa_pop, by="NAME")
   class(pa_tomap) #check
## [1] "sf"         "data.frame"
## Specialty hospitals 
varlist_hosp <- c("Childrens","Stroke","Trauma")

for (var in varlist_hosp){
   
   # Initialize table
   tablet <- as.character(paste0("list_pa_",var))
   
   # Import
   list <- readxl::read_xlsx("/Users/moniquearnold/Documents/PSOM/BMIN 503/Final Project/BMIN503_Final_Project/Data/Specialty Hospitals.xlsx", sheet=var)
   
   # Getting map coordinates (lat and long) of specialty hospital addresses 
   list2 <- cbind(list, 
                  type_hosp=var,
                  geocode(list$Hospital, 
                          output = "more", 
                          source = "google", 
                          messaging = FALSE))
   
   list_tib <- as_tibble(list2)
   list_sf <- st_as_sf(list_tib, coords = c("lon", "lat"), crs = 4326)
   
   # Assign final table
   assign(tablet,list_sf)
   
}

# checking addresses geocoded correctly
# mapview(list_pa_Childrens)
# mapview(list_pa_Stroke)
# mapview(list_pa_Trauma)

spec_sf <- st_as_sf(rbind.fill(list_pa_Childrens,list_pa_Stroke,list_pa_Trauma)) %>% dplyr::select(-'...3')
class(spec_sf)
## [1] "sf"         "data.frame"
# Creating annotations for graphs
high_pa_tomap <- pa_tomap %>% filter(pop_2019 > 1000000) #labelling counties with pop > 1 million

for(i in 1:nrow(high_pa_tomap)){
    ifelse(is.na(high_pa_tomap$County[i]), NA,      
    high_pa_tomap2 <- geocode(high_pa_tomap$County[i], output = "latlon", source = "google", messaging = FALSE))
          high_pa_tomap$lon[i] <- as.numeric(high_pa_tomap2[1])
          high_pa_tomap$lat[i] <- as.numeric(high_pa_tomap2[2])
}

all_hosp_sf <- st_as_sf(rbind.fill(spec_sf,df_hosp_fac_sf) %>% 
   mutate(type_hosp = ifelse(is.na(type_hosp),"All Hospitals",type_hosp)))


# Maps
ggplot() + 
   geom_sf(data = pa_tomap, aes(fill = pop_2019/100000), color="white") + 
   geom_sf(data = df_hosp_fac_sf, size = 2, shape = 21, fill = "gold") +
   geom_sf_label_repel(data = high_pa_tomap, aes(x=lon,y=lat,label=NAME),
                       size=3, force=7, nudge_y=-6, nudge_x=1, seed = 20,
                       alpha=0.8) +
   coord_sf() + 
   my_theme() + 
   scale_fill_viridis(discrete=FALSE, name = "Population (in 100,000s)", option = "mako", direction = -1,
                         limit = range(prev_min(pa_tomap$pop_2019/100000), 
                                       prev_max(pa_tomap$pop_2019/100000))) +
   labs(title= "Fig. 8: Pennsylvania Specialty Service Line Hospitals",
        caption = "Source: Centers for Disease Control, US Census Bureau (2019 Census).") +
   theme(legend.justification = c(0, 1),legend.position = "bottom")

ggplot() + 
   geom_sf(data = pa_tomap, aes(fill = pop_2019/100000), color="white") + 
   geom_sf(data = spec_sf, size = 2, shape = 21, fill = "gold") +
   # geom_sf_label_repel(data = high_pa_tomap, aes(x=lon,y=lat,label=NAME), 
   #                     size=3, force=7, nudge_y=-6, nudge_x=1, seed = 20,
   #                     alpha=0.8) +
   facet_wrap(~type_hosp, shrink=FALSE, dir="v", nrow=1) +
   coord_sf() + 
   my_theme() + 
   scale_fill_viridis(discrete=FALSE, name = "Population (in 100,000s)", option = "mako", direction = -1,
                         limit = range(prev_min(pa_tomap$pop_2019/100000), 
                                       prev_max(pa_tomap$pop_2019/100000))) +
   labs(title= "Fig. 9: Pennsylvania Specialty Service Line Hospitals",
        caption = "Source: Centers for Disease Control, US Census Bureau (2019 Census).") +
   theme(legend.justification = c(0, 1),
         legend.position = "bottom",
         legend.text = element_text(size = 6))

Investigating socioeconomic/demographic differences by county

Next, I will investigate different key socioeconomic/demographic factors on average and by county in PA using the tidycensus package and 5 year ACS data ending in 2019. Key factors investigated are:
+ Poverty rate (DP03_0119PE)
+ Median household income (DP03_0062E)
+ Percentage population 25 years and over with high school grad or higher (DP02_0066PE)
+ Percentage population without health insurance coverage (DP03_0099PE)
+ Percentage population with public health insurance coverage (DP03_0098PE)
+ Percentage population with private health insurance (DP03_0097PE)
+ Percentage of population White (DP05_0037PE)
+ Percentage of population Asian (DP05_0044PE)
+ Percentage of population Black or African American (DP05_0038PE)
+ Percentage of population Hispanic or Latino (DP05_0078PE)
+ Percentage of population American Indian and Alaska Native (DP05_0039PE)

# Obtaining 2019 acs5 variable list to investigate
dp17 <- load_variables(2019, "acs5/profile", cache = TRUE)

# Obtaining COUNTY-LEVEL socioeconomic/demographic data
pa_factors <- get_acs(geography = "county", 
              variables = c(povertyrate = "DP03_0119P",
                            medincome = "DP03_0062",
                            edurate = "DP02_0067P",
                            unemploymentrate = "DP03_0009P",
                            nohealthins = "DP03_0099P",
                            pubhealthins = "DP03_0098P",
                            privhealthins = "DP03_0097P",
                            racewhite = "DP05_0077P",
                            raceasian = "DP05_0080P",
                            raceblack = "DP05_0078P",
                            raceamerind = "DP05_0079P",
                            racehawaii = "DP05_0081P",
                            racehispanic = "DP05_0071P",
                            raceother = "DP05_0082P"),
              state = 42, #PA is 42 
              year = 2019,
              geometry=TRUE)

unique(pa_factors$variable)
##  [1] "edurate"          "unemploymentrate" "medincome"        "privhealthins"    "pubhealthins"     "nohealthins"      "povertyrate"      "racehispanic"     "racewhite"        "raceblack"        "raceamerind"      "raceasian"        "racehawaii"       "raceother"
tableout <- data.frame(cbind(unique(pa_factors$variable),
                  c("Percentage of Population 25y+ with High Diploma or Equivalency",
                    "Unemployment Rate (%)",
                    "Median Household Income (in millions of USD)",
                    "Percentage of population with private health insurance",
                    "Percentage of population with public health insurance coverage",
                    "Percentage of population without health insurance coverage ",
                    "Poverty Rate (%)",
                    "Ethnciticy -- % Hispanic or Latino",
                    "Race -- % White", 
                    "Race -- % Black or African American",
                    "Race -- % American Indian and Alaska Native",
                    "Race -- % Asian", 
                    "Race -- % Native Hawaiian and Other Pacific Islander",
                    "Race -- % Other Race")))

table_soc <- data.frame(pa_factors) %>% 
   group_by(variable) %>%
   dplyr::select(NAME,variable, estimate) %>% 
   mutate(estimate = as.numeric(estimate),
          variable = as.character(variable)) %>%
   dplyr::summarize(mean_estimate=mean(estimate, na.rm = TRUE)) %>%
   mutate(mean_estimate = ifelse(variable %in% c("medincome"), 
                             scales::dollar(mean_estimate), 
                             percent(mean_estimate))) %>%
   left_join(tableout,by=c("variable"="X1")) %>%
   dplyr::rename(`Socioeconomic/Demographic Factor`=X2,
                 `Mean Value`=mean_estimate)


table_soc %>%
   filter(variable %in% c("racewhite","raceblack","raceasian","racehawaii","raceamerind","raceother","raceracehispanic","povertyrate","unemploymentrate","medianincome","nohealthins","edurate")) %>%
   mutate(variable = factor(variable,levels=c("racewhite","raceblack","raceasian","racehawaii","raceamerind","raceother","raceracehispanic","povertyrate","unemploymentrate","medianincome","nohealthins","edurate"))) %>%
   dplyr::select(`Socioeconomic/Demographic Factor`,`Mean Value`) %>%
  kable(caption="Table 4. Socioeconomic and Demographic Factors Across PA Counties", 
        align="lr", 
        digits = 3,  
        col.names=c("Factor","Mean Value"))  %>%
   kable_styling(fixed_thead = T, bootstrap_options = c("hover", "condensed"), full_width = F)
Table 4. Socioeconomic and Demographic Factors Across PA Counties
Factor Mean Value
Percentage of Population 25y+ with High Diploma or Equivalency 89.89%
Percentage of population without health insurance coverage 5.91%
Poverty Rate (%) 8.31%
Race – % American Indian and Alaska Native 0.10%
Race – % Asian 1.50%
Race – % Black or African American 4.44%
Race – % Native Hawaiian and Other Pacific Islander 0.01%
Race – % Other Race 0.09%
Race – % White 87.87%
Unemployment Rate (%) 5.04%

Mapping socioeconomic/demographic variables.

# Mapping variables

tableout_econ <- tableout %>% filter(X1 %in% c("edurate","unemploymentrate","povertyrate","medincome","nohealthins"))
table_soc_econ <- table_soc %>% filter(variable %in% c("edurate","unemploymentrate","povertyrate","medincome","nohealthins"))
pa_factors_econ <- pa_factors %>% filter(variable %in% c("edurate","unemploymentrate","povertyrate","medincome","nohealthins"))

counter = 0 
#for(i in unique(pa_factors$variable)){
for(i in unique(table_soc_econ$variable)){
   # dynamic figure labelling
   counter <- counter + 1
   figletter <- toupper(letters[counter])
   
   # dynamic titling
   tableout2 <- tableout_econ %>% filter(X1==i)
   titleout <- as.character(paste0("Fig. 10",figletter,". Pennsylvania ",tableout2$X2," by County"))
   
   # dynamic annotations
   anno_soc <- table_soc_econ %>% filter(variable==i)
   meanlabel <- 
      paste0("Calculated Estimated Mean in PA: ",anno_soc$`Mean Value`)

   # subset data for variable of interest
   spec_table_econ <- pa_factors_econ %>% filter(variable==i)
   var_pa_tomap <- spec_table_econ %>% 
      filter(NAME=="Allegheny County, Pennsylvania" | NAME=="Philadelphia County, Pennsylvania") %>%
      mutate(label=paste0(gsub(' County, Pennsylvania','',NAME),": ",estimate)) %>% 
      st_join(high_pa_tomap)

   # final maps
   demo_map <- ggplot() + 
      geom_sf(data = spec_table_econ, aes(fill = estimate), color="white") + 
      geom_sf_label_repel(data = var_pa_tomap, aes(x=lon,y=lat,label=label),
                          size=3, force=7, nudge_y=-5, nudge_x=1, seed = 10,
                          fill = alpha(c("white"),0.8)) +
      coord_sf() + 
      my_theme() + 
      scale_fill_viridis(discrete=FALSE, option = "rocket", direction = -1,
                            limit = range(prev_min(spec_table_econ$estimate),
                                          prev_max(spec_table_econ$estimate))) +
      labs(title = paste0(strwrap(titleout,width=50),collapse="\n"),
           caption = "Source: US Census Bureau (2019 Census).",
           subtitle = paste0(strwrap(meanlabel,width=50),collapse="\n")) +
      theme(legend.justification = c(0, 1),
            legend.position = "bottom")
   print(demo_map)
}

Better visualizing racial makeup and health insurance variation by charts.

# Race
pa_factors_race <- pa_factors %>% data.frame() %>%
   filter(grepl("race", variable, fixed = TRUE)) %>%
   dplyr::select(NAME, variable, estimate) %>%
   mutate(County = paste0(gsub(' County, Pennsylvania','',NAME)))

pa_factors_race_wide <- pa_factors_race %>% 
   dplyr::select(-NAME) %>%
   reshape(idvar = c("County"), timevar = "variable", direction = "wide") %>%
   dplyr::arrange(desc(estimate.racewhite), .by_group = TRUE) %>%
   dplyr::rename(black=estimate.raceblack, 
                 white=estimate.racewhite,
                 asian=estimate.raceasian,
                 amerind=estimate.raceamerind,
                 hispanic=estimate.racehispanic,
                 hpislander=estimate.racehawaii,
                 other=estimate.raceother)

head_pa_race_long <- melt(head(pa_factors_race_wide, n=34), id.vars=c("County"), 
                            measure.vars=c("white","black","asian","amerind","hispanic","hpislander","other"),
    variable.name="race", value.name="percent")

tail_pa_race_long <- melt(tail(pa_factors_race_wide, n=34), id.vars=c("County"), 
                            measure.vars=c("white","black","asian","amerind","hispanic","hpislander","other"),
    variable.name="race", value.name="percent")


top10 <- head_pa_race_long %>% 
   filter(variable == "white")%>%
   arrange(-desc(value)) %>%
   bind_rows(head_pa_race_long %>% filter(variable != "white")) %>%
   mutate(County = factor(County, unique(County))) %>%
   ggplot(aes(x=County, y=value, fill=forcats::fct_rev(variable))) +
   geom_bar(stat="identity", colour = "black") + 
   coord_flip() + 
   my_theme_bar() +
   theme(axis.title.x=element_blank()) +
   scale_y_continuous(name="% of population") + 
   labs(title="Fig. 11: Racial Composition of Pennsylvania by County",
        caption = "") +
   scale_fill_brewer(palette = "Set3", direction=-1) + 
   theme(legend.title=element_blank(), 
         legend.position="bottom",
         axis.title.x=element_blank(),
         axis.title.y=element_blank())

bottom10 <- tail_pa_race_long %>% 
   filter(variable == "white")%>%
   arrange(-desc(value)) %>%
   bind_rows(tail_pa_race_long %>% filter(variable != "white")) %>%
   mutate(County = factor(County, unique(County))) %>%
   ggplot(aes(x=County, y=value, fill=forcats::fct_rev(variable))) +
   geom_bar(stat="identity", colour = "black") + 
   coord_flip() + 
   my_theme_bar() +
   theme(axis.title.x=element_blank()) +
   scale_y_continuous(name="% of population") + 
   labs(title="",
        caption = "Source: US Census Bureau.") +
   scale_fill_brewer(palette = "Set3", direction=-1) + 
   theme(legend.title=element_blank(), 
         legend.position="bottom",
         axis.title.x=element_blank(),
         axis.title.y=element_blank())

mylegend<-g_legend(top10) #shared legend

grid.arrange(arrangeGrob(top10 + theme(legend.position="none"),
                         bottom10 + theme(legend.position="none"),
                         nrow=1),
             mylegend, nrow=2,heights=c(10, 1))

# Insurance
pa_noins <- pa_factors %>% data.frame() %>%
   filter(grepl("ins", variable, fixed = TRUE)) %>%
   dplyr::select(NAME, variable, estimate) %>%
   mutate(County = paste0(gsub(' County, Pennsylvania','',NAME))) %>% 
   filter(variable == "nohealthins")

pa_noins %>% 
   arrange(-desc(estimate)) %>%
   mutate(County = factor(County, unique(County))) %>%
   ggplot(aes(x=County, y=estimate)) +
   geom_bar(stat="identity", color="black", fill="darkorange") + 
   geom_abline(slope=0, intercept=5.9,  col = "darkgreen",lty=2) +
   geom_abline(slope=0, intercept=8.6,  col = "black",lty=2) +
   scale_y_continuous(name="% of population") +
   my_theme_bar() +
   labs(title=paste0(strwrap("Fig. 12: Percentage of Population Without Health Insurance in Pennsylvania by County",width=50),collapse="\n"),
        caption = "Surce: US Census Bureau") +
   theme(legend.title=element_blank(), 
         legend.position="bottom",
         axis.title.x=element_blank(),
         axis.text.x = element_text(angle = 75,  hjust=0.75)) +
   annotate("text", x = 20, y = 8.7, label = paste0(strwrap("8.6% of the US population did not have health insurance for all or part of 2020",width=50),collapse="\n")) +
      annotate("text", x = 20, y = 6.0, label = paste0(strwrap("5.9% of the PA population did not have health insurance for all or part of 2020",width=50),collapse="\n"))

Predicting ED wait time using multivariate analyses

Next, I run multivariate analyses to see if there are any relations between the socioeconomic/demographic variables and the TEC data. As mentioned, I first convert both to zip code by using zcta.

## Obtaining ZIP-CODE TABULATION AREA LEVEL socioeconomic/demographic data
varying <- c("edurate","unemploymentrate","medincome","nohealthins","povertyrate","racehispanic","racewhite","raceblack")

zc_factors <- get_acs(geography = "zip code tabulation area", 
              variables = c(povertyrate = "DP03_0119P",
                            medincome = "DP03_0062",
                            edurate = "DP02_0067P",
                            unemploymentrate = "DP03_0009P",
                            nohealthins = "DP03_0099P",
                            racewhite = "DP05_0077P",
                            raceblack = "DP05_0078P",
                            ethhispanic = "DP05_0071P"),
              state = 42, #PA is 42 
              year = 2019) %>% 
   dplyr::select(-moe) %>%
   as.data.frame() %>%
   reshape(idvar = c("GEOID","NAME"), timevar = "variable", direction = "wide")
   
## Obtaining ZIP-CODE TABULATION AREA LEVEL TEC data for hospital wait time (most populous field)

# Calculating assumption straight line 5 mile radius from each hospital
zc_tec_OP18b <- df_hosp_fac %>% filter(!is.na(OP_18b))
lat <- as.matrix(zc_tec_OP18b$Latitude)
long <- as.matrix(zc_tec_OP18b$Longitude)

# Since search_dataset takes forever to run, I first did this code, then saved down the dataset, and for subsequent runs and knits I pulled from the save down stataset
   # zc_tec_OP18b2 <- zc_tec_OP18b %>% 
   #    rowwise() %>% 
   #    mutate(CTD = list(search_radius(Latitude,Longitude,radius=5))) %>%
   #    mutate(ziplist=list(unlist(CTD[[1]])))
   # # Converting the lists of zipcodes into individual columns for each zip code
   # zc_tec_OP18b3 <- zc_tec_OP18b2 %>%
   #    as.data.frame() %>%
   #    mutate(ziplist2 = as.character(ziplist))
   # 
   # zc_tec_OP18b_final<- setDT(zc_tec_OP18b3)[, paste0("zipcode", 1:34) := tstrsplit(ziplist2, '", "')]
   # 
   # zipnames <- zc_tec_OP18b_final %>% 
   #    dplyr:: select(starts_with("zipcode"))
   # 
   # zc_tec_OP18b_final <- zc_tec_OP18b_final %>% 
   #    mutate_at(.vars = names(zipnames), 
   #             .funs = gsub,
   #             pattern = "[^[:alnum:] ]",
   #             replacement = "\\") %>%
   #    mutate_at(.vars = names(zipnames), 
   #             .funs = gsub,
   #             pattern = "c",
   #             replacement = "\\") %>%
   #    mutate_at(.vars=names(zipnames), .funs=as.character) %>%
   #    dplyr::select(Facility.Name, addr, Longitude, Latitude, OP_18b, names(zipnames))
   # 
   # # Saving dataset because search_radius takes some time to run
   # write.csv(zc_tec_OP18b_final, "zc_tec_OP18b_final.csv", row.names=F)

zc_tec_OP18b_final<- read.csv(file = "zc_tec_OP18b_final.csv")

# Wrangling from long to wide to use for regressions
forreg_zc_OP18b <- gather(zc_tec_OP18b_final, label, zipcode, zipcode1:zipcode34, factor_key=TRUE) %>%
   dplyr::select(-label) %>%
   arrange(zipcode) %>%
   na.omit %>%
   filter(!zipcode=="harater0") %>%
   filter(!as.numeric(zipcode)>40000) %>% #dropping ohio zipcodes
   filter(!as.numeric(zipcode)<10000) #dropping new jersey zipcodes

# Calculating mean wait time for zipcodes serviced by more than one hospital
forreg_zc_mean <- forreg_zc_OP18b %>% 
  group_by(zipcode) %>%
  summarise_at(vars(OP_18b), list(OP_18b_mean = mean))

# Lookup zcta for zipcodes using crosswalk dataset (zcta package)[https://github.com/jjchern/zcta]
zip_data <- zcta::zipzcta %>% filter(state=="PA")
zip_more <- zipcodeR::zip_code_db %>% filter(state=="PA")

forreg_zc <- forreg_zc_mean %>%
   left_join(zip_data, by = c("zipcode" = "zip")) %>%
   left_join(zc_factors, by=c("zcta" = "GEOID")) %>%
   left_join(zip_more, by=c("zipcode" = "zipcode"))

Correlation Matrix

Then, I conduct a correlation matrix analysis to determine if any of the socioeconomic/demographic variables are significantly correlated and need to be added as interaction variables to the regression model. Visualized using a correlogram.

r_forreg_zc <- forreg_zc %>% dplyr::select(zipcode, OP_18b_mean,population,population_density, estimate.edurate, estimate.unemploymentrate, estimate.povertyrate, estimate.medincome, estimate.nohealthins, estimate.racewhite, estimate.raceblack, estimate.ethhispanic)

M <- cor(r_forreg_zc[,unlist(lapply(r_forreg_zc, is.numeric))], method = "pearson", use = "complete.obs")

print("Fig. 13: Correlation Matrix of Socioeconomic/Demographic Variables")
## [1] "Fig. 13: Correlation Matrix of Socioeconomic/Demographic Variables"
corrplot(M, type = "upper", order = "hclust", tl.col = "black", tl.srt = 45, sig.level = 0.05, insig = "blank", method="color", addCoef.col = "white")

Linear Regression Model

Finally, I ran the regression model, adding an interaction variable for black and white race given significant correlation (-0.86).

# Linear Regression (lm and glm produce similar results)
regout <- lm(data = r_forreg_zc, OP_18b_mean ~  population_density + population + 
      estimate.edurate + estimate.unemploymentrate + estimate.povertyrate + estimate.medincome +
      estimate.nohealthins + estimate.ethhispanic + 
      estimate.racewhite*estimate.raceblack) #consider interaction factor

# Regression Summary
print("Table 5A. Regressions Results")
## [1] "Table 5A. Regressions Results"
print(summary(regout))
## 
## Call:
## lm(formula = OP_18b_mean ~ population_density + population + 
##     estimate.edurate + estimate.unemploymentrate + estimate.povertyrate + 
##     estimate.medincome + estimate.nohealthins + estimate.ethhispanic + 
##     estimate.racewhite * estimate.raceblack, data = r_forreg_zc)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -77.589 -19.752   0.059  18.309 128.704 
## 
## Coefficients:
##                                         Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                            2.517e+02  4.912e+01   5.124 4.24e-07 ***
## population_density                     8.331e-04  3.967e-04   2.100 0.036197 *  
## population                             2.884e-04  1.086e-04   2.655 0.008179 ** 
## estimate.edurate                       3.551e-01  3.393e-01   1.047 0.295788    
## estimate.unemploymentrate             -4.276e-01  4.624e-01  -0.925 0.355526    
## estimate.povertyrate                  -1.369e-01  2.491e-01  -0.549 0.582935    
## estimate.medincome                     6.828e-05  6.394e-05   1.068 0.286129    
## estimate.nohealthins                  -8.249e-01  3.733e-01  -2.209 0.027585 *  
## estimate.ethhispanic                  -4.264e-01  3.874e-01  -1.101 0.271459    
## estimate.racewhite                    -1.366e+00  3.569e-01  -3.827 0.000146 ***
## estimate.raceblack                    -1.262e+00  3.566e-01  -3.540 0.000437 ***
## estimate.racewhite:estimate.raceblack  3.427e-03  3.678e-03   0.932 0.351897    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 30.52 on 513 degrees of freedom
##   (52 observations deleted due to missingness)
## Multiple R-squared:  0.2297, Adjusted R-squared:  0.2132 
## F-statistic: 13.91 on 11 and 513 DF,  p-value: < 2.2e-16
print("Table 5B. Regressions Results Another Way - with Variance Inflation Factors")
## [1] "Table 5B. Regressions Results Another Way - with Variance Inflation Factors"
print(summ(regout, digits = 5, vifs = TRUE))
## MODEL INFO:
## Observations: 525 (52 missing obs. deleted)
## Dependent Variable: OP_18b_mean
## Type: OLS linear regression 
## 
## MODEL FIT:
## F(11,513) = 13.90828, p = 0.00000
## R² = 0.22972
## Adj. R² = 0.21320 
## 
## Standard errors: OLS
## --------------------------------------------------------------------------------------------------
##                                                    Est.       S.E.     t val.         p        VIF
## ------------------------------------------- ----------- ---------- ---------- --------- ----------
## (Intercept)                                   251.70463   49.12073    5.12420   0.00000           
## population_density                              0.00083    0.00040    2.10022   0.03620    1.95598
## population                                      0.00029    0.00011    2.65493   0.00818    1.40921
## estimate.edurate                                0.35509    0.33929    1.04658   0.29579    2.47897
## estimate.unemploymentrate                      -0.42761    0.46240   -0.92476   0.35553    1.59582
## estimate.povertyrate                           -0.13687    0.24911   -0.54945   0.58294    2.54141
## estimate.medincome                              0.00007    0.00006    1.06777   0.28613    1.88752
## estimate.nohealthins                           -0.82488    0.37334   -2.20944   0.02758    1.30818
## estimate.ethhispanic                           -0.42643    0.38735   -1.10089   0.27146    9.15153
## estimate.racewhite                             -1.36578    0.35692   -3.82661   0.00015   36.00332
## estimate.raceblack                             -1.26236    0.35660   -3.53996   0.00044   20.63162
## estimate.racewhite:estimate.raceblack           0.00343    0.00368    0.93176   0.35190    1.90993
## --------------------------------------------------------------------------------------------------
# Plotting coefficients
summs_title_noscale <- "Fig. 14A: Coefficient Estimates for Regression Model, Predicting ED Waiting Time at PA Hospitals"
plot_summs(regout, scale = FALSE, plot.distributions = TRUE, inner_ci_level = .95) +
   labs(title=paste0(strwrap(summs_title_noscale,width=50),collapse="\n"))

summs_title_scaled <- "Fig. 14B: Scaled Coefficient Estimates for Regression Model, Predicting ED Waiting Time at PA Hospitals"
plot_summs(regout, scale = TRUE, plot.distributions = TRUE, inner_ci_level = .95) +
   labs(title=paste0(strwrap(summs_title_scaled,width=50),collapse="\n"))

Checking model assumptions

print("Table 6. Checking Regression Model Assumptions")
## [1] "Table 6. Checking Regression Model Assumptions"
performance::check_model(regout, panel=TRUE)

Step-forward regression results

### Stepwise (step forward) regression, adding variables one by one in order of significance
print("Table 7. Step Forward Regression Results")
## [1] "Table 7. Step Forward Regression Results"
n<-ols_step_forward_p(regout)

Discussion

First, I investigated the variation of TEC metrics. Compared to national averages, patients in Pennsylvania, on average, spend more time waiting in the ED than the national average (Figures 1 and 2). Additionally, patients who go to the ED with signs of a possible heart attack or stroke, on average, were received timely care less often (Figures 3 and 4). Patients with chest pain in PA had to wait longer before transfer to a specialty hospital than the US national average (Figure 5). Though PA underperforms nationally, the regional graphs shows that PA performs the second best in the Northeast region (behind Vermont) with respect to general ED wait time for all patients (Fig. 6), but on par with the region for timely care for acute life-threatening symptoms (chest pain, stroke symptoms, Fig. 7).

PA has more than 160 hospitals spread across the state, and hospital time varies significantly by hospital. Tables 1 and 2 show the top 10 hospitals with the longest and shortest wait times for all patients, respectively. Notably, 3 of the top 10 longest wait times are in the Penn Medicine network (highlighted in yellow) Notably, psychiatric patients/patients required mental/behavioral health interventions spend longer waiting in EDs for care nationally, regionally and at the state level (Fig. 2, Fig. 6, Table 3).

As expected, the largest number of hospitals are clustered in the most densely populated counties, as highlighted in Figures 8 and 9. These figures show that all hospitals, major pediatric hospitals, stroke hospitals and trauma centers are cluster in more densely populated areas of the country, namely Alleghany county (which contains the city of Pittsburgh) and the east, especially Philadelphia county (which contains the city of Philadelphia).

Next, I investigated the variation of socioeconomic factors across PA counties. Table 4 highlights the mean socioeconomic and demographic characteristics for the state. In summary, Pennsylvania is overwhelmingly white and non-hispanic, with a median income of $57,056, a poverty rate of around 8%, unemployment rate around 5% and almost 6% of persons without health insurance.

Figures 10A-E thereafter highlight the variation across the state in these metrics, with Philadelphia consistently doing worse on most metrics and Alleghany county doing better (for example, a 9.2% unemployment rate in Philadelphia county vs. 4.8% in Alleghany vs. 5.0% in the state). Though Pennsylvania is overwhelmingly white, per Fig. 11, the capital city of Philadelphia is not and is in fact the most racially diverse city in the state. Fig. 12 highlights insurance coverage for PA counties, and notes that most counties had a smaller proportion of the population without insurance than nationwide.

Finally, I investigated the relationship between these socioeconomic/demographic factors (independent variables) and hospital waiting time (dependent variable), using a linear regression model. The model gave some interesting results. Tables 5A-B of the regression results show that population and population density have small but significant positive coefficients in this model, whilst percentage of population without health insurance have percentage of black and white populations have larger, negative and significant effects in this model. Note that Table 5B shows large VIFs for black and white race proportions, indicating some possible interaction/collinearity between these variables. The correlogram in Fig. 13 showed significant negative correlation between proportion of white vs. black residents (-0.86). For this purpose, I included these as interaction terms in the regression model.

The model parameters and check of the model as shown in Fig 14. Indicate that this model perhaps is not the best fit for the data. The model has some heteroskedasticity but the residuals do follow a normal distribution (Fig 14). Furthermore, the standard errors of many of the variables are large, and the adjusted-Rsquared of the model is less than 0.25. The step forward stepwise of the regression variables shows that, for this list of variables, including all of them produces the best model.

The regression model used is clearly flawed. The negative coefficient of proportion without health insurance was especially surprising and counterintuitive, and further work needs to be done to explain this relationship. Additionally, the intercept is very large, with a very small intercept, perhaps indicating that a lot of the effect on the waiting time is not captured by the independent variables.

Thus, while the model shows that there is perhaps some association between population, racial factors and health insurance status of the population with ED wait time, further work needs to investigate if other variables are also or more at play. A quick literature review indicates that though the characteristics of all comers play a role, characteristics specific to the ED visit also determine wait time, such as triage category, arrival time, age, day of the week and ED occupancy and suggest a model that focuses at the individual level. This research contributes to the literature by also focusing on system-level factors, recognizing that these do not tell the full story.

Assumptions/Limitations

  1. Assumption that only major pediatric hospitals see pediatric patients, which is not true. This underestimates access to pediatric care, but still exists as a proxy for access to high quality pediatric care.
  2. Assumption that only pediatric hospitals see pediatric patients, which is not true. This underestimates access to pediatric care, but still exists as a proxy for access to high quality pediatric care.
    Assumption of a 5-mile radius around each hospital with an ED, thereby assuming this is the service area for each hospital ED. This is inherently limiting given, for example, that patients are sometimes transported by helicopter for emergent care (eg. Via PennStar), and patients in rural areas may have to drive more than 5 miles. Thus, it may underestimate how far patients may travel for emergent care.
  3. Assumption that if a zipcode is covered by more than 1 hospital, the TEC metric used in the regression will be the average of those hospital’s metrics.

Conclusions:

  • Pennsylvania performs poorly in timely and effective ED care.
  • Access to care in Pennsylvania varies with location, and falls along population lines, racial lines and economic lines.
  • A model that focuses only on geographic-level socioeconomic and demographic factors explains some but not all of the reasons for long wait times in the ED, and most also account for visit-level/individual-level factors as well.

Acknowledgements

I would like to thank the following advisors, provided invaluable advice and guidance for data sources, mapping the data and assumptions for the regression analysis:
+ Dr. Gary Weissman, MD, MSHP (Assistant Professor of Medicine and Senior Fellow Leonard Davies Institute)
+ Dr. John Holmes, PhD, FACE, FACMI (Professor of Medical Informatics in Epidemiology, Associate Director of the Institute for Biomedical Informatics)

I would also like to thank my classmates in BMIN 503 for providing inspiration as well as valuable peer-peer

References

  • America’s Health Rankings analysis of America’s Health Rankings composite measure, United Health Foundation, AmericasHealthRankings.org, Accessed 2021.
  • Ballard, MD, PhD, MSPH, FACP, David J., ed. Achieving STEEEP Health Care: Baylor Health Care System’s Quality Improvement Journey. Productivity Press, 2019.
  • Elements of Access to Health Care. Content last reviewed June 2018. Agency for Healthcare Research and Quality, Rockville, MD. Accessed at https://www.ahrq.gov/research/findings/nhqrdr/chartbooks/access/elements.html
  • Kaiser Family Foundation. “The Pennsylvania Health Care Landscape.” (2016).
  • Kingery, Kate L. “County Health Rankings & Roadmaps.” Journal of Youth Development 13.3 (2018): 259-263. Rust, George, et al. “Practical barriers to timely primary care access: impact on adult use of emergency department services.” Archives of internal medicine 168.15 (2008): 1705-1710.
  • “Disrupting Disparities in Pennsylvania: Retooling for Geographic, Racial and Ethnic Growth.” College of Nursing and Health Professions, 5 Apr. 2021.